This document outlines the R code used across the lectures for the Advanced Statistics course in the MSc Psychological Research Methods programme at the University of Sheffield, UK. The code follows the lectures, but also expands on the content covered in them.

Lecture 1

Introduction

First, we use datasets that are included in R packages. We can see what data we have in the standard R packages by calling data(), or we can see the datasets specific to a certain R package by calling data(package = 'lme4').

In this case, we can call the pre-loaded dataset cars. More information on these data is available by calling ?cars or help(cars).

data("cars")  #calls the dataset
head(cars)  # prints first 6 observations in the dataset
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

We can use a scatter plot to see the raw values.

plot(cars$speed, cars$dist, xlab = "Predictor", ylab = "Outcome")  #As this plot was only used as an illustration of linear regression, we changed the labels of the x and y-axes. This was done using the xlab and ylab parameters. You can change the labels using the same code, e.g. xlab='Speed (mph)', ylab='Stopping distance (ft)'
abline(lm(dist ~ speed, data = cars), lwd = 2)  #the abline function adds one or more straight lines to the plot that is active in your environment. The lm function is used to fit linear models; here dist is modelled as a function of speed. We also indicate that the values for distance and speed can be found in the cars dataset (data = cars). Finally, we set the width of the line with the lwd = 2 parameter. 
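The line that abline draws comes from the least-squares solution. For a single predictor, the slope is cov(x, y) / var(x) and the intercept follows from the means, so we can check the fit by hand:

```r
# Checking the least-squares line by hand on the built-in cars dataset
data(cars)
slope <- cov(cars$speed, cars$dist) / var(cars$speed)  # cov(x, y) / var(x)
intercept <- mean(cars$dist) - slope * mean(cars$speed)
fit <- lm(dist ~ speed, data = cars)
all.equal(unname(coef(fit)), c(intercept, slope))  # TRUE
```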

Data simulation

We can also simulate some data:

set.seed(456)  # we can set starting numbers that are used to generate our random values. Random numbers are rarely truly random: https://engineering.mit.edu/engage/ask-an-engineer/can-a-computer-generate-a-truly-random-number/
Babies = data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))  #We create a new data frame (Babies) where we have Age and Weight as variables. 100 Age values are sampled from random uniform distribution (runif) with lower bound of 1 (minimum) and upper bound of 30 (maximum). 100 Weight values are generated using random normal distribution (rnorm) with mean of 4000 and SD of 500 
Babies$Height = rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)  # Height is generated using random normal distribution where mean is a function of Age and Weight, while SD is 5. 
Babies$Gender = factor(rbinom(100, 1, 0.5))  # 100 Gender values are generated using random binomial function with equal probability of being assigned one or the other sex category
levels(Babies$Gender) = c("Girls", "Boys")  #We use the levels function to assign the Girls and Boys labels to the 0 and 1 levels generated by the function
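Before modelling, it is worth sanity-checking the simulated values against the intended distributions. A minimal sketch (re-creating Babies so the chunk runs standalone; with the same seed the values are identical):

```r
# Sanity checks on the simulated data
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
Babies$Gender <- factor(rbinom(100, 1, 0.5))
levels(Babies$Gender) <- c("Girls", "Boys")
table(Babies$Gender)  # counts per category
range(Babies$Age)     # should lie within 1 and 30
```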

We can plot and inspect raw data:

par(mfrow = c(1, 3), bty = "n", mar = c(5, 4, 0.1, 0.1), cex = 1.1, pch = 16)  # the par function sets global plotting settings. mfrow indicates the panel layout (1 row and 3 columns), bty sets the type of box around the plot, mar defines the margins around the plot, cex magnifies the size of labels, while pch sets the type of points used in the plot.
plot(Babies$Age, Babies$Height, xlab = "Age (months)", ylab = "Height (cm)")
plot(Babies$Weight, Babies$Height, xlab = "Weight (grams)", ylab = "Height (cm)")
boxplot(Babies$Height ~ Babies$Gender, xlab = "Gender", ylab = "Height (cm)")

We can also check the coding and formatting of the variables in our data frame.

str(Babies)
## 'data.frame':    100 obs. of  4 variables:
##  $ Age   : num  4 7 22 26 24 11 3 9 8 12 ...
##  $ Weight: num  3668 3872 4339 4448 4309 ...
##  $ Height: num  61.5 56.1 59.3 55.2 60.3 ...
##  $ Gender: Factor w/ 2 levels "Girls","Boys": 1 1 2 2 2 2 1 1 1 1 ...

Finally, we can also load datasets from our local disk drive. The files should be placed in your working directory. There are a number of tutorials that show how to organise new projects so that all of your data, code and output are created and saved in one place: https://r4ds.had.co.nz/workflow-projects.html

inequality <- read.table("inequality.txt", sep = "\t", header = T)  # read a file in table format and create a data frame from it. The sep parameter defines the separator in the data - tab-delimited format in this case, but it could also be anything else, such as ',' for comma-separated values, ' ' for space-separated values, etc. header = T indicates that the names of the variables are in the first row of the data file.

# You can also read numerous other formats, such as SPSS .sav files; however,
# you will often need additional packages for this. The foreign package -
# library(foreign) - has a read.spss function that can be used to read in SPSS
# files: dataExample<-read.spss('Example.sav', to.data.frame=T)

Linear regression

One predictor

lm1 <- lm(Height ~ Age, data = Babies)  # linear model where Height is modelled as a function of Age. 
lm1$coefficients  # We can print only the coefficients
## (Intercept)         Age 
##   57.025804    0.143174
summary(lm1$coefficients)  #Note that this only summarises the two coefficient values as numbers. To get the information frequently used to judge significance and importance of predictors (standard errors, t values, p-values), call summary(lm1)$coefficients instead
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1432 14.3638 28.5845 28.5845 42.8051 57.0258
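The full coefficient table, with standard errors, t values and p-values, lives in the summary object rather than in the lm object itself. A sketch (re-creating the data so the chunk runs standalone):

```r
# Extracting the full coefficient table from the summary object
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
lm1 <- lm(Height ~ Age, data = Babies)
coefTable <- summary(lm1)$coefficients
coefTable  # columns: Estimate, Std. Error, t value, Pr(>|t|)
```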

Two predictors

lm2 <- lm(Height ~ Age + Gender, data = Babies)  # linear model where Height is analysed as a function of Age and Gender
lm2$coefficients
## (Intercept)         Age  GenderBoys 
##  57.5038799   0.1403955  -0.8309449

Main effects and interaction (numerical x categorical)

lm3 <- lm(Height ~ Age * Gender, data = Babies)  # linear model where Height is modelled as a function of Age and Gender, as well as their interaction
lm3$coefficients
##    (Intercept)            Age     GenderBoys Age:GenderBoys 
##     56.1448603      0.2202400      1.8315868     -0.1607307

Main effects and interaction (numerical x numerical)

lm4 <- lm(Height ~ Age * Weight, data = Babies)
lm4$coefficients
##   (Intercept)           Age        Weight    Age:Weight 
## 30.7244904854  0.6913148965  0.0066360329 -0.0001397745

Other information that we get from the linear model

lm1 <- lm(Height ~ Age, data = Babies)
summary(lm1)
## 
## Call:
## lm(formula = Height ~ Age, data = Babies)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4765  -4.1601  -0.3703   3.9198  12.3842 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 57.02580    1.18751  48.021   <2e-16 ***
## Age          0.14317    0.06426   2.228   0.0282 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.283 on 98 degrees of freedom
## Multiple R-squared:  0.04821,    Adjusted R-squared:  0.0385 
## F-statistic: 4.964 on 1 and 98 DF,  p-value: 0.02817
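For a model with a single predictor, the t statistic of that predictor and the model F statistic carry the same information: F equals t squared. A quick check (re-creating the data so the chunk runs standalone):

```r
# F = t^2 for a single-predictor model
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
lm1 <- lm(Height ~ Age, data = Babies)
tAge <- summary(lm1)$coefficients["Age", "t value"]
Fstat <- summary(lm1)$fstatistic["value"]
all.equal(unname(Fstat), tAge^2)  # TRUE
```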

We can calculate R2 step-by-step. First, we need predictions from the model that includes the predictor, as these will give us the residual sum of squares.

Babies$lm1 = predict(lm1, newdata = Babies)  # predict Height based on our lm1 model.
Babies$diff = Babies$lm1 - Babies$Height  #calculate differences between predicted (lm1) and observed values (Height) 

We can plot these differences using ggplot.

require(ggplot2)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.1.1
ggplot() + geom_linerange(data = Babies, aes(x = Age, ymin = Height, ymax = lm1,
    colour = diff), size = 1.2) + geom_line(data = Babies, aes(x = Age, y = lm1),
    size = 1.2) + geom_point(data = Babies, aes(Age, y = Height), size = 2) + ylab("Height") +
    xlab("Age") + ggtitle("SS_residual")

The second ingredient for our coefficient of determination is the total sum of squares - the total variation in the data. We can get this by building a model with the intercept only.

lm0 <- lm(Height ~ 1, data = Babies)
summary(lm0)
## 
## Call:
## lm(formula = Height ~ 1, data = Babies)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.4165  -4.2284  -0.2062   3.6744  13.5940 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59.3953     0.5388   110.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.388 on 99 degrees of freedom
Babies$lm0 = predict(lm0, newdata = Babies)  #predict Height based on average value only
Babies$diff2 = Babies$lm0 - Babies$Height  #calculate difference between predicted (lm0) and observed values (Height)

We can plot these differences using ggplot.

ggplot() + geom_linerange(data = Babies, aes(x = Age, ymin = Height, ymax = lm0,
    colour = diff2), size = 1.2) + geom_line(data = Babies, aes(x = Age, y = lm0),
    size = 1.2) + geom_point(data = Babies, aes(Age, y = Height), size = 2) + ylab("Height") +
    xlab("Age") + ggtitle("SS_total")

The R2 coefficient is:

1 - (sum(Babies$diff^2)/(sum(Babies$diff2^2)))
## [1] 0.04821098
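We can verify that the hand-computed ratio matches the R-squared that summary() reports. A sketch (re-creating the data so the chunk runs standalone; resid(lm1) gives the same differences we computed above):

```r
# Hand-computed R^2 versus the value reported by summary()
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
lm1 <- lm(Height ~ Age, data = Babies)
rss <- sum(resid(lm1)^2)                             # residual sum of squares
tss <- sum((Babies$Height - mean(Babies$Height))^2)  # total sum of squares
R2 <- 1 - rss / tss
all.equal(R2, summary(lm1)$r.squared)  # TRUE
```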

Improvement in our prediction:

Babies$diff3 = Babies$lm1 - Babies$lm0  #Improvement based on the inclusion of Age as a predictor - differences between the predicted values from (lm1) and intercept only model (lm0)
ggplot() + geom_linerange(data = Babies, aes(x = Age, ymin = lm1, ymax = lm0, colour = diff3),
    size = 1.2) + geom_line(data = Babies, aes(x = Age, y = lm0), size = 1.2) + geom_line(data = Babies,
    aes(Age, y = lm1), size = 1.3, linetype = "dotdash") + geom_point(data = Babies,
    aes(x = Age, y = Height), size = 0.9) + ylab("Height") + xlab("Age") + ggtitle("Improvement") +
    theme(axis.title = element_text(size = 14), axis.text = element_text(size = 12))

F-value:

(sum(Babies$diff3^2)/1)/(sum(Babies$diff^2)/98)
## [1] 4.963995
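The same F value drops out of a formal comparison of the intercept-only and one-predictor models with anova(). A sketch (re-creating the data so the chunk runs standalone):

```r
# The hand-computed F value via a formal model comparison
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
lm0 <- lm(Height ~ 1, data = Babies)
lm1 <- lm(Height ~ Age, data = Babies)
anova(lm0, lm1)  # the F column matches the model F statistic
```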

Practical aspect of the lecture

inequality <- read.table("inequality.txt", sep = ",", header = T)  #Reading the data in R
head(inequality)
##        state median_household_income share_unemployed_seasonal
## 1    Alabama                   42278                     0.060
## 2     Alaska                   67629                     0.064
## 3    Arizona                   49254                     0.063
## 4   Arkansas                   44922                     0.052
## 5 California                   60487                     0.059
## 6   Colorado                   60940                     0.040
##   share_population_in_metro_areas share_population_with_high_school_degree
## 1                            0.64                                    0.821
## 2                            0.63                                    0.914
## 3                            0.90                                    0.842
## 4                            0.69                                    0.824
## 5                            0.97                                    0.806
## 6                            0.80                                    0.893
##   share_non_citizen share_white_poverty gini_index share_non_white
## 1              0.02                0.12      0.472            0.35
## 2              0.04                0.06      0.422            0.42
## 3              0.10                0.09      0.455            0.49
## 4              0.04                0.12      0.458            0.26
## 5              0.13                0.09      0.471            0.61
## 6              0.06                0.07      0.457            0.31
##   share_voters_voted_trump hate_crimes_per_100k_splc
## 1                     0.63                0.12583893
## 2                     0.53                0.14374012
## 3                     0.50                0.22531995
## 4                     0.60                0.06906077
## 5                     0.33                0.25580536
## 6                     0.44                0.39052330
##   avg_hatecrimes_per_100k_fbi
## 1                   1.8064105
## 2                   1.6567001
## 3                   3.4139280
## 4                   0.8692089
## 5                   2.3979859
## 6                   2.8046888

Probability density plots: outcomes

par(mfrow = c(1, 2))
plot(density(inequality$hate_crimes_per_100k_splc, na.rm = T), main = "Crimes per 100k")  #probability density for hate crimes outcomes. na.rm=T indicates that only cases that are not NAs should be returned
plot(density(inequality$avg_hatecrimes_per_100k_fbi, na.rm = T), main = "Average Crimes")

Probability density plots: predictors

par(mfrow = c(1, 2))
plot(density(inequality$median_household_income, na.rm = T), main = "Income")
plot(density(inequality$gini_index, na.rm = T), main = "Gini")

Scatter plots:

par(mfrow = c(1, 2))
plot(inequality$median_household_income, inequality$avg_hatecrimes_per_100k_fbi,
    xlab = "Median household income", ylab = "Avg hatecrimes")
plot(inequality$gini_index, inequality$avg_hatecrimes_per_100k_fbi, xlab = "Gini index",
    ylab = "Avg hatecrimes")

cor(inequality[, c(2, 8, 12)], use = "complete.obs")  #calculate correlations using only rows that have all values (no NAs). We are only focusing on columns 2, 8 and 12. inequality is a data frame with (n rows, m columns), so we can access only the first row by calling inequality[1,], or just the first column by inequality[,1]; the first row and first column would be inequality[1,1]. When using inequality[, c(2, 8, 12)] we are asking for all rows but only the second, eighth and twelfth columns.
##                             median_household_income gini_index
## median_household_income                   1.0000000 -0.1497444
## gini_index                               -0.1497444  1.0000000
## avg_hatecrimes_per_100k_fbi               0.3182464  0.4212719
##                             avg_hatecrimes_per_100k_fbi
## median_household_income                       0.3182464
## gini_index                                    0.4212719
## avg_hatecrimes_per_100k_fbi                   1.0000000
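The correlation matrix gives magnitudes but no significance tests; cor.test tests a single correlation. A sketch on the simulated Babies data (the same call works on any two columns of inequality):

```r
# Significance test for a single Pearson correlation
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
ct <- cor.test(Babies$Age, Babies$Height)  # Pearson by default
ct$estimate  # correlation coefficient
ct$p.value   # two-sided p-value
```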

Stepwise approach to building a linear model

mod1 <- lm(avg_hatecrimes_per_100k_fbi ~ median_household_income, data = inequality)  #modelling avg hatecrimes as a function of median_household_income
summary(mod1)
## 
## Call:
## lm(formula = avg_hatecrimes_per_100k_fbi ~ median_household_income, 
##     data = inequality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3300 -1.1183 -0.1656  0.7827  7.7762 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)  
## (Intercept)             -9.564e-01  1.448e+00  -0.661   0.5121  
## median_household_income  6.054e-05  2.603e-05   2.326   0.0243 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.642 on 48 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1013, Adjusted R-squared:  0.08256 
## F-statistic: 5.409 on 1 and 48 DF,  p-value: 0.0243

Adding a new predictor and comparing it with the previous model.

mod2 <- lm(avg_hatecrimes_per_100k_fbi ~ median_household_income + gini_index, data = inequality)
anova(mod2)  #anova type comparison of the model that allows us to see whether newly added predictor explained additional variation in the outcome. We can see changes in the Sum Sq for residuals and for the main effects
## Analysis of Variance Table
## 
## Response: avg_hatecrimes_per_100k_fbi
##                         Df Sum Sq Mean Sq F value    Pr(>F)    
## median_household_income  1 14.584  14.584  7.0649 0.0107093 *  
## gini_index               1 32.389  32.389 15.6906 0.0002517 ***
## Residuals               47 97.020   2.064                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can also test whether the influence of median household income needs to be adjusted across the values of the Gini index - an interaction between our predictors.

mod3 <- lm(avg_hatecrimes_per_100k_fbi ~ median_household_income * gini_index, data = inequality)
anova(mod1, mod2, mod3)  # comparison between three models
## Analysis of Variance Table
## 
## Model 1: avg_hatecrimes_per_100k_fbi ~ median_household_income
## Model 2: avg_hatecrimes_per_100k_fbi ~ median_household_income + gini_index
## Model 3: avg_hatecrimes_per_100k_fbi ~ median_household_income * gini_index
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     48 129.41                                  
## 2     47  97.02  1    32.389 19.742 5.539e-05 ***
## 3     46  75.47  1    21.550 13.135 0.0007218 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
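Nested F tests are not the only way to compare models; information criteria penalise complexity. A sketch with the simulated Babies models (the same AIC() call works for mod1, mod2 and mod3):

```r
# Comparing models with information criteria
set.seed(456)
Babies <- data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height <- rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
lm1 <- lm(Height ~ Age, data = Babies)
lm4 <- lm(Height ~ Age * Weight, data = Babies)
AIC(lm1, lm4)  # lower values indicate a better fit-complexity trade-off
```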

Summary of the model:

summary(mod3)
## 
## Call:
## lm(formula = avg_hatecrimes_per_100k_fbi ~ median_household_income * 
##     gini_index, data = inequality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.16383 -0.96279 -0.01053  0.88008  2.75763 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         9.056e+01  3.070e+01   2.950 0.004986 ** 
## median_household_income            -1.724e-03  4.965e-04  -3.472 0.001136 ** 
## gini_index                         -2.001e+02  6.666e+01  -3.002 0.004330 ** 
## median_household_income:gini_index  3.907e-03  1.078e-03   3.624 0.000722 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.281 on 46 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.4759, Adjusted R-squared:  0.4417 
## F-statistic: 13.92 on 3 and 46 DF,  p-value: 1.367e-06

We can visualise the interaction using the interactions package.

require(interactions)
## Loading required package: interactions
## Warning: package 'interactions' was built under R version 4.1.1
interact_plot(mod3, pred = median_household_income, modx = gini_index, plot.points = T)

Interactive visualisation:

# simulatedD is not created earlier in this document; a sketch of how it could
# be built as a grid of predictor values with model predictions (Avg_hate_pred):
simulatedD <- expand.grid(median_household_income = seq(35000, 70000, 1000), gini_index = seq(0.42,
    0.53, 0.01))
simulatedD$Avg_hate_pred <- predict(mod3, newdata = simulatedD)
p <- ggplot(simulatedD, aes(median_household_income, Avg_hate_pred, color = gini_index,
    frame = gini_index)) + geom_point()  # Using ggplot to make the plot
plotly::ggplotly(p)  #make it interactive using plotly

Model diagnostics

Quantile-quantile plot: Normality

car::qqPlot(mod3)

## [1]  9 18

Homoscedasticity: Linearity

car::residualPlot(mod3, type = "rstudent")  # we can call car package without loading it into R environment by calling car::

Outliers:

car::influenceIndexPlot(mod3)  #influence of individual observation on our model

Autocorrelation of residuals

par(bty = "n", mar = c(5, 4, 0.1, 0.1), cex = 1.1, pch = 16)
stats::acf(resid(mod3))
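The first bar after lag 0 in the acf plot is the lag-1 autocorrelation, and it can be reproduced by hand. A sketch on a toy residual vector (hypothetical data, used only to illustrate the computation):

```r
# Lag-1 autocorrelation: acf() versus the hand computation it performs
set.seed(1)
res <- rnorm(50)
r1 <- acf(res, plot = FALSE)$acf[2]  # lag-1 value as acf computes it
n <- length(res)
manual <- sum((res[-n] - mean(res)) * (res[-1] - mean(res))) / sum((res - mean(res))^2)
all.equal(as.numeric(r1), manual)  # TRUE
```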

Predicted versus observed data:

par(bty = "n", mar = c(5, 4, 0.1, 0.1), cex = 1.1, pch = 16)
plot(predict(mod3), mod3$model$avg_hatecrimes_per_100k_fbi, xlab = "Predicted values (model 3)",
    ylab = "Observed values (avg hatecrimes)")  #The x-axis shows predict(mod3) - values predicted by our model. The y-axis shows the data that were used to build the model. I decided to take these values from the model frame (mod3$model), instead of the original data frame (inequality), as the model frame excludes the row dropped due to missingness. 

Finally, we can subset our data and exclude a data point that might skew our results.

mod3.cor <- lm(avg_hatecrimes_per_100k_fbi ~ median_household_income * gini_index,
    data = inequality, subset = -9)  #We subset our data to exclude row 9
summary(mod3.cor)
## 
## Call:
## lm(formula = avg_hatecrimes_per_100k_fbi ~ median_household_income * 
##     gini_index, data = inequality, subset = -9)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.90964 -0.94966 -0.08526  0.73470  2.55257 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         4.429e+01  3.080e+01   1.438    0.157
## median_household_income            -7.992e-04  5.228e-04  -1.529    0.133
## gini_index                         -9.668e+01  6.725e+01  -1.438    0.157
## median_household_income:gini_index  1.837e-03  1.145e-03   1.604    0.116
## 
## Residual standard error: 1.154 on 45 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1285, Adjusted R-squared:  0.07041 
## F-statistic: 2.212 on 3 and 45 DF,  p-value: 0.09974

Lecture 3

Introduction

We now add more variables to our simulated data - it just keeps growing:

library(truncnorm)
require(lavaan)
set.seed(456)
Babies = data.frame(Age = round(runif(100, 1, 30)), Weight = rnorm(100, 4000, 500))
Babies$Height = rnorm(100, 40 + 0.2 * Babies$Age + 0.004 * Babies$Weight, 5)
Babies$Sex = rbinom(100, 1, 0.5)
Babies$Nutrition = rtruncnorm(n = 100, a = 0, b = 30, mean = 5, sd = 10)
Babies$PhyExer = rnorm(100, 180, 50)
Babies$GMA = rnorm(100, 180, 50)
Babies$SocialBeh = rnorm(100, 180 + Babies$PhyExer, 80)
Babies$TummySleep = rbinom(100, 1, 0.5)
Babies$CognitiveAb = rnorm(100, 10 + 7 * Babies$Nutrition + 0.1 * Babies$PhyExer +
    3 * Babies$GMA + 0.03 * Babies$PhyExer * Babies$SocialBeh, 5)
Babies$Sex = as.factor(Babies$Sex)
levels(Babies$Sex) = c("Girls", "Boys")

Fitting the model:

# install.packages('lavaan') - we need lavaan package to fit structural
# equation models, both path models and confirmatory factor analysis
require(lavaan)
# model has to be written in a separate object. We have quotation marks ('') to
# indicate that this is a textual input.
modelAbility <- "
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA
"
# we use sem function to fit the model, while we specify that data is in the
# dataset Babies
fit1 <- sem(modelAbility, data = Babies)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some observed variances are larger than 1000000
##   lavaan NOTE: use varTable(fit) to investigate
summary(fit1)
## lavaan 0.6-9 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate    Std.Err   z-value  P(>|z|)
##   SocialBeh ~                                            
##     Nutrition           0.030     1.105    0.027    0.978
##     PhyExer             1.281     0.146    8.796    0.000
##     GMA                -0.075     0.151   -0.500    0.617
##   CognitiveAb ~                                          
##     SocialBeh           9.579     0.469   20.428    0.000
##     Nutrition           7.019     6.899    1.017    0.309
##     GMA                 2.461     0.941    2.614    0.009
## 
## Variances:
##                    Estimate    Std.Err   z-value  P(>|z|)
##    .SocialBeh        5515.809   780.053    7.071    0.000
##    .CognitiveAb    215129.015 30423.837    7.071    0.000

We can also plot the estimates of our model:

# install.packages('tidySEM')
require("tidySEM")
graph_sem(fit1, variance_diameter = 0.2)

Indices of the model fit:

summary(fit1, fit.measures = TRUE)  # we need to include fit.measures=TRUE for lavaan to print them out
## lavaan 0.6-9 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               438.108
##   Degrees of freedom                                 7
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.503
##   Tucker-Lewis Index (TLI)                      -2.479
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1328.506
##   Loglikelihood unrestricted model (H1)      -1220.888
##                                                       
##   Akaike (AIC)                                2673.012
##   Bayesian (BIC)                              2693.853
##   Sample-size adjusted Bayesian (BIC)         2668.587
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          1.464
##   90 Percent confidence interval - lower         1.303
##   90 Percent confidence interval - upper         1.632
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.080
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate    Std.Err   z-value  P(>|z|)
##   SocialBeh ~                                            
##     Nutrition           0.030     1.105    0.027    0.978
##     PhyExer             1.281     0.146    8.796    0.000
##     GMA                -0.075     0.151   -0.500    0.617
##   CognitiveAb ~                                          
##     SocialBeh           9.579     0.469   20.428    0.000
##     Nutrition           7.019     6.899    1.017    0.309
##     GMA                 2.461     0.941    2.614    0.009
## 
## Variances:
##                    Estimate    Std.Err   z-value  P(>|z|)
##    .SocialBeh        5515.809   780.053    7.071    0.000
##    .CognitiveAb    215129.015 30423.837    7.071    0.000

Model modifications:

modelAbility2 <- "
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA+PhyExer
"
fit2 <- sem(modelAbility2, data = Babies)
summary(fit2, fit.measures = TRUE)
## lavaan 0.6-9 ended normally after 56 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                               438.108
##   Degrees of freedom                                 7
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1220.888
##   Loglikelihood unrestricted model (H1)      -1220.888
##                                                       
##   Akaike (AIC)                                2459.776
##   Bayesian (BIC)                              2483.222
##   Sample-size adjusted Bayesian (BIC)         2454.798
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)
##   SocialBeh ~                                          
##     Nutrition          0.030    1.105    0.027    0.978
##     PhyExer            1.281    0.146    8.796    0.000
##     GMA               -0.075    0.151   -0.500    0.617
##   CognitiveAb ~                                        
##     SocialBeh          5.701    0.213   26.781    0.000
##     Nutrition          8.548    2.352    3.634    0.000
##     GMA                2.814    0.321    8.764    0.000
##     PhyExer           11.390    0.413   27.577    0.000
## 
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##    .SocialBeh       5515.809  780.053    7.071    0.000
##    .CognitiveAb    24999.990 3535.532    7.071    0.000

Model comparison:

lavTestLRT(fit1, fit2)  # we are comparing fit of the first and the second model
## Chi-Squared Difference Test
## 
##      Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
## fit2  0 2459.8 2483.2   0.00                                  
## fit1  1 2673.0 2693.8 215.24     215.24       1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the modification indices:

modindices(fit1, sort = TRUE)  #we are checking them for the first model, high values indicate potential pathways that we could include and improve the model fit
##            lhs op         rhs     mi    epc sepc.lv sepc.all sepc.nox
## 17 CognitiveAb  ~     PhyExer 88.379 11.390  11.390    0.553    0.011
## 16   SocialBeh  ~ CognitiveAb 88.379 -0.228  -0.228   -2.420   -2.420
## 23     PhyExer  ~ CognitiveAb 82.143  0.128   0.128    2.635    2.635
## 27         GMA  ~ CognitiveAb  1.601  0.025   0.025    0.529    0.529
## 19   Nutrition  ~ CognitiveAb  1.002  0.007   0.007    1.114    1.114
## 18   Nutrition  ~   SocialBeh  0.000  0.000   0.000    0.000    0.000
## 22     PhyExer  ~   SocialBeh  0.000  0.000   0.000    0.000    0.000
## 26         GMA  ~   SocialBeh  0.000  0.000   0.000    0.000    0.000
## 21   Nutrition  ~         GMA  0.000  0.000   0.000    0.000    0.000
## 25     PhyExer  ~         GMA  0.000  0.000   0.000    0.000    0.000
## 13     PhyExer ~~         GMA  0.000  0.000   0.000       NA    0.000
## 29         GMA  ~     PhyExer  0.000  0.000   0.000    0.000    0.000
## 10   Nutrition ~~     PhyExer  0.000  0.000   0.000       NA    0.000
## 24     PhyExer  ~   Nutrition  0.000  0.000   0.000    0.000    0.000
## 20   Nutrition  ~     PhyExer  0.000  0.000   0.000    0.000    0.000

Calculation of the indirect pathways:

modelAbilityPath <- "
SocialBeh~Nutrition+a*PhyExer+GMA
CognitiveAb~b*SocialBeh+c*Nutrition+GMA

indirect := a*b
direct := c
total := indirect + direct
"
# the multiplication sign (*) labels a parameter - here it assigns the label a
# to the regression coefficient of PhyExer. Using := we can specify derived
# quantities in the sem model. You can also do any other type of calculation, e.g. a^2

fitPath <- sem(modelAbilityPath, data = Babies)
summary(fitPath)
## lavaan 0.6-9 ended normally after 46 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate    Std.Err   z-value  P(>|z|)
##   SocialBeh ~                                            
##     Nutrition           0.030     1.105    0.027    0.978
##     PhyExer    (a)      1.281     0.146    8.796    0.000
##     GMA                -0.075     0.151   -0.500    0.617
##   CognitiveAb ~                                          
##     SocialBeh  (b)      9.579     0.469   20.428    0.000
##     Nutrition  (c)      7.019     6.899    1.017    0.309
##     GMA                 2.461     0.941    2.614    0.009
## 
## Variances:
##                    Estimate    Std.Err   z-value  P(>|z|)
##    .SocialBeh        5515.809   780.053    7.071    0.000
##    .CognitiveAb    215129.015 30423.837    7.071    0.000
## 
## Defined Parameters:
##                    Estimate    Std.Err   z-value  P(>|z|)
##     indirect           12.275     1.519    8.079    0.000
##     direct              7.019     6.899    1.017    0.309
##     total              19.294     7.074    2.727    0.006
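
The defined indirect effect is just the product of the two labelled coefficients (a*b = 1.281 * 9.579 ≈ 12.275). A minimal base-R sketch of the same logic, using simulated data rather than the Babies dataset, with two separate lm() fits:

```r
set.seed(1)
# hypothetical data standing in for PhyExer, SocialBeh, and CognitiveAb
n <- 100
x <- rnorm(n)                # predictor (plays the role of PhyExer)
m <- 1.2 * x + rnorm(n)      # mediator (plays the role of SocialBeh)
y <- 0.8 * m + rnorm(n)      # outcome (plays the role of CognitiveAb)

a <- coef(lm(m ~ x))[["x"]]  # path a: predictor -> mediator
b <- coef(lm(y ~ m))[["m"]]  # path b: mediator -> outcome
indirect <- a * b            # indirect effect, as defined with := in lavaan
```

Note that lavaan estimates both equations simultaneously, so its coefficients can differ slightly from two separate least-squares fits.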

The piecewiseSEM package

# install.packages('piecewiseSEM')
require(piecewiseSEM)
model1 <- psem(lm(SocialBeh ~ Nutrition + PhyExer + GMA, data = Babies), lm(CognitiveAb ~
    SocialBeh + Nutrition + GMA, data = Babies))  # psem combines the separate linear regressions into one piecewise structural equation model
summary(model1, .progressBar = FALSE)
## 
## Structural Equation Model of model1 
## 
## Call:
##   SocialBeh ~ Nutrition + PhyExer + GMA
##   CognitiveAb ~ SocialBeh + Nutrition + GMA
## 
##     AIC      BIC
##  229.364   255.416
## 
## ---
## Tests of directed separation:
## 
##                Independ.Claim Test.Type DF Crit.Value P.Value    
##   CognitiveAb ~ PhyExer + ...      coef 95    26.8792       0 ***
## 
## Global goodness-of-fit:
## 
##   Fisher's C = 209.364 with P-value = 0 and on 2 degrees of freedom
## 
## ---
## Coefficients:
## 
##      Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate    
##     SocialBeh Nutrition   0.0300    1.1278 96     0.0266  0.9789       0.0020    
##     SocialBeh   PhyExer   1.2814    0.1487 96     8.6187  0.0000       0.6604 ***
##     SocialBeh       GMA  -0.0753    0.1538 96    -0.4899  0.6253      -0.0375    
##   CognitiveAb SocialBeh   9.5792    0.4786 96    20.0156  0.0000       0.9023 ***
##   CognitiveAb Nutrition   7.0193    7.0413 96     0.9969  0.3213       0.0447    
##   CognitiveAb       GMA   2.4607    0.9607 96     2.5614  0.0120       0.1155   *
## 
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
## 
## ---
## Individual R-squared:
## 
##      Response method R.squared
##     SocialBeh   none      0.44
##   CognitiveAb   none      0.81
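
piecewiseSEM's global goodness-of-fit statistic, Fisher's C, combines the p-values of the k independence claims as C = -2 * sum(log(p_i)) and is compared against a chi-square distribution with 2k degrees of freedom. A small base-R sketch of that calculation, using hypothetical p-values (the single claim above has p ≈ 0, which is why C is so large here):

```r
p <- c(0.21, 0.48)                               # hypothetical d-separation p-values
C <- -2 * sum(log(p))                            # Fisher's C statistic
df_C <- 2 * length(p)                            # two degrees of freedom per claim
p_global <- pchisq(C, df_C, lower.tail = FALSE)  # global goodness-of-fit p-value
```

A non-significant p_global indicates that the paths omitted from the model are consistent with the data, i.e. the proposed causal structure is not contradicted.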

Practical aspect of the lecture

Getting the data

NBAPath <- read.table("NBApath.txt", sep = "\t", header = T)  # tab-separated file; header = T reads variable names from the first row

Summary of the NBA data

summary(NBAPath)
##      TEAM                PCT            Player              Pos           
##  Length:3810        Min.   :0.1061   Length:3810        Length:3810       
##  Class :character   1st Qu.:0.3780   Class :character   Class :character  
##  Mode  :character   Median :0.5000   Mode  :character   Mode  :character  
##                     Mean   :0.4905                                        
##                     3rd Qu.:0.6098                                        
##                     Max.   :0.8902                                        
##       Age              GP             PER        
##  Min.   :18.00   Min.   : 1.00   Min.   :-13.10  
##  1st Qu.:23.00   1st Qu.:34.00   1st Qu.: 10.00  
##  Median :25.00   Median :61.00   Median : 12.80  
##  Mean   :26.05   Mean   :53.73   Mean   : 12.75  
##  3rd Qu.:29.00   3rd Qu.:77.00   3rd Qu.: 15.80  
##  Max.   :43.00   Max.   :82.00   Max.   : 35.20

Correlation matrix

cor(NBAPath[, c(2, 5:7)])  # correlation matrix of PCT, Age, GP, and PER (columns 2 and 5 to 7 of the NBAPath data frame)
##            PCT        Age         GP        PER
## PCT 1.00000000 0.14304325 0.08849459 0.07720633
## Age 0.14304325 1.00000000 0.05170204 0.03598025
## GP  0.08849459 0.05170204 1.00000000 0.45360129
## PER 0.07720633 0.03598025 0.45360129 1.00000000
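
With 3,810 observations even weak correlations can be statistically significant, which is worth checking before interpreting a path model. cor.test() gives the test and confidence interval for a single pair; a base-R sketch on simulated data (so it runs without the NBApath.txt file):

```r
set.seed(2)
x <- rnorm(200)
y <- 0.4 * x + rnorm(200)   # two weakly correlated variables
ct <- cor.test(x, y)        # Pearson correlation, t-test, and 95% CI
ct$estimate                 # sample correlation coefficient
ct$p.value                  # p-value for H0: rho = 0
```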

Univariate plots

par(mfrow = c(1, 2), bty = "n", mar = c(5, 4, 0.1, 0.1), cex = 1.1, pch = 16)
plot(density(NBAPath$PER), main = "")
plot(density(NBAPath$PCT), main = "")

Bivariate plots

par(mfrow = c(1, 2), bty = "n", mar = c(5, 4, 0.1, 0.1), cex = 1.1, pch = 16)
plot(NBAPath$Age, NBAPath$PER)
plot(NBAPath$GP, NBAPath$PER)

Estimating the model

NBAmod1 <- "
GP~b*Age
PER~a*Age+c*GP

dir := a
ind := b*c
tot := dir + ind
"
NBAfit1 <- sem(NBAmod1, data = NBAPath)
summary(NBAfit1)
## lavaan 0.6-9 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##                                                       
##   Number of observations                          3810
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   GP ~                                                
##     Age        (b)    0.315    0.098    3.196    0.001
##   PER ~                                               
##     Age        (a)    0.016    0.018    0.869    0.385
##     GP         (c)    0.093    0.003   31.333    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .GP              645.883   14.798   43.646    0.000
##    .PER              21.834    0.500   43.646    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     dir               0.016    0.018    0.869    0.385
##     ind               0.029    0.009    3.179    0.001
##     tot               0.045    0.020    2.222    0.026

R2 and other measures of fit

inspect(NBAfit1, "r2")
##    GP   PER 
## 0.003 0.206
-2 * logLik(NBAfit1)  # deviance 
## 'log Lik.' 58025.67 (df=5)
AIC(NBAfit1)  # Akaike information criterion
## [1] 58035.67
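
AIC is simply the deviance plus a penalty of 2 per free parameter: with deviance 58025.67 and 5 parameters, 58025.67 + 2*5 = 58035.67, matching the output. The same identity holds for any likelihood-based fit, for example a base-R lm:

```r
fit <- lm(dist ~ speed, data = cars)         # simple regression on a built-in dataset
k <- attr(logLik(fit), "df")                 # number of estimated parameters (incl. residual variance)
deviance_fit <- -2 * as.numeric(logLik(fit)) # deviance = -2 log-likelihood
aic_manual <- deviance_fit + 2 * k           # AIC = -2 logLik + 2k
all.equal(aic_manual, AIC(fit))              # TRUE
```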

Respecification of the model

NBAmod2 <- "
GP~b*Age
PER~c*GP

ind := b*c
"
NBAfit2 <- sem(NBAmod2, data = NBAPath)
summary(NBAfit2, fit.measures = T)
## lavaan 0.6-9 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         4
##                                                       
##   Number of observations                          3810
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.755
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.385
## 
## Model Test Baseline Model:
## 
##   Test statistic                               888.633
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.001
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -29013.211
##   Loglikelihood unrestricted model (H1)     -29012.833
##                                                       
##   Akaike (AIC)                               58034.422
##   Bayesian (BIC)                             58059.403
##   Sample-size adjusted Bayesian (BIC)        58046.693
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.041
##   P-value RMSEA <= 0.05                          0.987
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.005
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   GP ~                                                
##     Age        (b)    0.315    0.098    3.196    0.001
##   PER ~                                               
##     GP         (c)    0.093    0.003   31.417    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .GP              645.883   14.798   43.646    0.000
##    .PER              21.838    0.500   43.646    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ind               0.029    0.009    3.179    0.001

Model comparison

# install.packages('semTools')
require(semTools)
diff <- compareFit(NBAfit1, NBAfit2)
summary(diff)
## ################### Nested Model Comparison #########################
## Chi-Squared Difference Test
## 
##         Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)
## NBAfit1  0 58036 58067 0.000                              
## NBAfit2  1 58034 58059 0.755      0.755       1     0.3849
## 
## ####################### Model Fit Indices ###########################
##         chisq df pvalue rmsea    cfi    tli  srmr        aic        bic
## NBAfit1 .000†        NA .000† 1.000† 1.000  .000† 58035.667  58066.894 
## NBAfit2 .755   1   .385 .000† 1.000† 1.001† .005  58034.422† 58059.403†
## 
## ################## Differences in Fit Indices #######################
##                   df rmsea cfi   tli  srmr    aic   bic
## NBAfit2 - NBAfit1  1     0   0 0.001 0.005 -1.245 -7.49
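
The chi-squared difference test reported above can be reproduced by hand: the change in the test statistic is evaluated against a chi-square distribution with the change in degrees of freedom:

```r
chisq_diff <- 0.755 - 0.000   # difference in model test statistics (NBAfit2 - NBAfit1)
df_diff <- 1 - 0              # difference in degrees of freedom
pchisq(chisq_diff, df_diff, lower.tail = FALSE)  # approximately 0.385
```

Because the test is non-significant, the more parsimonious model (NBAfit2) is preferred, which is also what the lower AIC and BIC values suggest.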

Respecification of the model 2

NBAmod3 <- "
GP~b*Age
PER~a*Age+c*GP
PCT~d*PER
ind1 := b*c*d
ind2 := a*d
tot := ind1 + ind2
"
NBAfit3 <- sem(NBAmod3, data = NBAPath)
summary(NBAfit3, fit.measures = T)
## lavaan 0.6-9 ended normally after 30 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         7
##                                                       
##   Number of observations                          3810
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                87.884
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               999.296
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.914
##   Tucker-Lewis Index (TLI)                       0.741
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -27272.876
##   Loglikelihood unrestricted model (H1)     -27228.934
##                                                       
##   Akaike (AIC)                               54559.752
##   Bayesian (BIC)                             54603.469
##   Sample-size adjusted Bayesian (BIC)        54581.227
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.106
##   90 Percent confidence interval - lower         0.088
##   90 Percent confidence interval - upper         0.126
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.047
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   GP ~                                                
##     Age        (b)    0.315    0.098    3.196    0.001
##   PER ~                                               
##     Age        (a)    0.016    0.018    0.869    0.385
##     GP         (c)    0.093    0.003   31.333    0.000
##   PCT ~                                               
##     PER        (d)    0.002    0.000    4.780    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .GP              645.883   14.798   43.646    0.000
##    .PER              21.834    0.500   43.646    0.000
##    .PCT               0.023    0.001   43.646    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ind1              0.000    0.000    2.647    0.008
##     ind2              0.000    0.000    0.855    0.393
##     tot               0.000    0.000    2.015    0.044
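
The defined parameters print as 0.000 only because of rounding to three decimals; the products are small but non-zero. Plugging in the rounded estimates from the output above:

```r
b <- 0.315; c <- 0.093; d <- 0.002   # rounded estimates from the summary above
ind1 <- b * c * d                    # indirect effect of Age via GP and PER
ind1                                 # about 5.9e-05, which displays as 0.000
```

This is a good reason to inspect standardized estimates or more decimal places before concluding an effect is absent.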

Parameter estimates

parameterestimates(NBAfit3, boot.ci.type = "bca.simple", standardized = T)
##     lhs op       rhs label     est     se      z pvalue ci.lower ci.upper
## 1    GP  ~       Age     b   0.315  0.098  3.196  0.001    0.122    0.507
## 2   PER  ~       Age     a   0.016  0.018  0.869  0.385   -0.020    0.051
## 3   PER  ~        GP     c   0.093  0.003 31.333  0.000    0.087    0.099
## 4   PCT  ~       PER     d   0.002  0.000  4.780  0.000    0.001    0.003
## 5    GP ~~        GP       645.883 14.798 43.646  0.000  616.879  674.887
## 6   PER ~~       PER        21.834  0.500 43.646  0.000   20.853   22.814
## 7   PCT ~~       PCT         0.023  0.001 43.646  0.000    0.022    0.025
## 8   Age ~~       Age        17.498  0.000     NA     NA   17.498   17.498
## 9  ind1 :=     b*c*d  ind1   0.000  0.000  2.647  0.008    0.000    0.000
## 10 ind2 :=       a*d  ind2   0.000  0.000  0.855  0.393    0.000    0.000
## 11  tot := ind1+ind2   tot   0.000  0.000  2.015  0.044    0.000    0.000
##     std.lv std.all std.nox
## 1    0.315   0.052   0.012
## 2    0.016   0.013   0.003
## 3    0.093   0.453   0.453
## 4    0.002   0.077   0.077
## 5  645.883   0.997   0.997
## 6   21.834   0.794   0.794
## 7    0.023   0.994   0.994
## 8   17.498   1.000  17.498
## 9    0.000   0.002   0.000
## 10   0.000   0.001   0.000
## 11   0.000   0.003   0.001
# boot.ci.type = 'bca.simple' requests adjusted bootstrap percentile intervals
# (BCa without the acceleration correction); note these are only used when the
# model is fitted with se = 'bootstrap', otherwise standard delta-method
# intervals are returned

Bootstrapping our model

# install.packages('bootstrap')
require(bootstrap)  # note: bootstrapLavaan below is provided by lavaan itself, not the bootstrap package
boot <- bootstrapLavaan(NBAfit3, R = 1000)
summary(boot)
##        b                  a                  c                 d            
##  Min.   :0.001798   Min.   :-0.04508   Min.   :0.08161   Min.   :0.0009244  
##  1st Qu.:0.248816   1st Qu.: 0.00456   1st Qu.:0.09033   1st Qu.:0.0019244  
##  Median :0.316568   Median : 0.01673   Median :0.09315   Median :0.0022315  
##  Mean   :0.315661   Mean   : 0.01603   Mean   :0.09319   Mean   :0.0022347  
##  3rd Qu.:0.382383   3rd Qu.: 0.02691   3rd Qu.:0.09579   3rd Qu.:0.0025295  
##  Max.   :0.715760   Max.   : 0.07098   Max.   :0.11068   Max.   :0.0037406  
##      GP~~GP         PER~~PER        PCT~~PCT      
##  Min.   :618.1   Min.   :19.56   Min.   :0.02219  
##  1st Qu.:638.5   1st Qu.:21.29   1st Qu.:0.02320  
##  Median :645.8   Median :21.84   Median :0.02350  
##  Mean   :645.7   Mean   :21.85   Mean   :0.02351  
##  3rd Qu.:653.0   3rd Qu.:22.39   3rd Qu.:0.02379  
##  Max.   :678.4   Max.   :25.29   Max.   :0.02488
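
The matrix of bootstrap draws can also be used to build a percentile confidence interval for a derived quantity such as ind1 = b*c*d, by multiplying the sampled coefficients row by row. A sketch with hypothetical draws shaped like the summary above (with the real boot matrix you would index its b, c, and d columns the same way):

```r
set.seed(3)
R <- 1000
# hypothetical bootstrap draws mimicking the means and spreads reported above
draws <- cbind(b = rnorm(R, 0.316, 0.098),
               c = rnorm(R, 0.093, 0.004),
               d = rnorm(R, 0.0022, 0.0005))
ind1_boot <- draws[, "b"] * draws[, "c"] * draws[, "d"]  # indirect effect per draw
quantile(ind1_boot, c(0.025, 0.975))                     # 95% percentile interval
```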
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bootstrap_2019.6   semTools_0.5-5     piecewiseSEM_2.1.2 tidySEM_0.1.10    
## [5] lavaan_0.6-9       truncnorm_1.0-8    interactions_1.1.5 ggplot2_3.3.5     
## 
## loaded via a namespace (and not attached):
##   [1] TH.data_1.1-0         minqa_1.2.4           colorspace_2.0-2     
##   [4] ellipsis_0.3.2        rio_0.5.27            estimability_1.3     
##   [7] rstudioapi_0.13       farver_2.1.0          fansi_0.5.0          
##  [10] mvtnorm_1.1-2         codetools_0.2-18      splines_4.1.0        
##  [13] mnormt_2.0.2          knitr_1.37            texreg_1.37.5        
##  [16] jsonlite_1.7.2        nloptr_1.2.2.2        DiagrammeR_1.0.6.1   
##  [19] compiler_4.1.0        httr_1.4.2            emmeans_1.7.0        
##  [22] backports_1.2.1       assertthat_0.2.1      Matrix_1.3-3         
##  [25] lazyeval_0.2.2        cli_3.1.0             formatR_1.11         
##  [28] visNetwork_2.1.0      htmltools_0.5.1.1     tools_4.1.0          
##  [31] igraph_1.2.6          OpenMx_2.19.8         coda_0.19-4          
##  [34] gtable_0.3.0          glue_1.4.2            dplyr_1.0.7          
##  [37] Rcpp_1.0.7            carData_3.0-4         cellranger_1.1.0     
##  [40] jquerylib_0.1.4       vctrs_0.3.8           nlme_3.1-152         
##  [43] crosstalk_1.1.1       psych_2.1.9           xfun_0.29            
##  [46] stringr_1.4.0         proto_1.0.0           openxlsx_4.2.4       
##  [49] lme4_1.1-27.1         lifecycle_1.0.1       MASS_7.3-54          
##  [52] zoo_1.8-9             scales_1.1.1          hms_1.1.0            
##  [55] parallel_4.1.0        sandwich_3.0-1        RColorBrewer_1.1-2   
##  [58] yaml_2.2.1            curl_4.3.2            pander_0.6.4         
##  [61] sass_0.4.0            stringi_1.7.5         highr_0.9            
##  [64] fastDummies_1.6.3     checkmate_2.0.0       boot_1.3-28          
##  [67] zip_2.2.0             rlang_0.4.11          pkgconfig_2.0.3      
##  [70] evaluate_0.14         lattice_0.20-44       purrr_0.3.4          
##  [73] htmlwidgets_1.5.3     labeling_0.4.2        tidyselect_1.1.1     
##  [76] plyr_1.8.6            magrittr_2.0.1        R6_2.5.1             
##  [79] generics_0.1.0        multcomp_1.4-17       DBI_1.1.1            
##  [82] gsubfn_0.7            pillar_1.6.4          haven_2.4.1          
##  [85] foreign_0.8-81        withr_2.4.2           jtools_2.1.4         
##  [88] survival_3.2-11       abind_1.4-5           tibble_3.1.5         
##  [91] crayon_1.4.2          car_3.0-11            utf8_1.2.2           
##  [94] tmvnsim_1.0-2         plotly_4.10.0         MplusAutomation_1.0.0
##  [97] rmarkdown_2.11        grid_4.1.0            readxl_1.3.1         
## [100] data.table_1.14.0     pbivnorm_0.6.0        forcats_0.5.1        
## [103] digest_0.6.27         xtable_1.8-4          tidyr_1.1.3          
## [106] RcppParallel_5.1.4    stats4_4.1.0          munsell_0.5.0        
## [109] viridisLite_0.4.0     bslib_0.2.5.1